Evolving Perl code for protein secondary structure prediction

Author

Bob MacCallum

Abstract

Progress in the area of secondary structure prediction has been frustratingly slow[6]. The most accurate predictors at the moment are trained to predict one of three secondary structural states (helix, strand or coil) for each residue at position i using sequence information from a “window” of residues i ± 7. Information from more distant sequence positions should improve predictions further, since it is assumed that non-local interactions, like those that occur in sheet formation, can modulate the innate secondary structure preferences of a residue and its near neighbours. However, simply using a larger window does not help. First, the information content decreases rapidly as one moves away from i, because the likelihood that these residues are close in 3D space also diminishes. Second, there is the problem of using a fixed window to capture information from variable-length secondary structures.
Recently, attempts have been made to incorporate non-local information in secondary structure predictions. Baldi and coworkers[5] have used recurrent multi-pass neural networks and have shown that information from residues i ± 15 influences their predictions. Bystroff and coworkers[2] have taken another approach, which is to combine local predictors for secondary and supersecondary structures into a single large hidden Markov model which simultaneously takes into account context effects throughout the sequence. The more successful ab initio 3D predictors, such as Rosetta[1], may also have the side effect of producing more accurate secondary structure predictions. Unfortunately, these three approaches have not yet been shown to be superior to the established predictors in terms of the percentage of residues predicted correctly as helix, strand or coil (Q3). The best Q3 currently stands at around 76%[4], and any future improvement on this will be a strong indicator of the successful incorporation of long-range and/or folding information into the predictors.
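The two quantities this paragraph relies on, the sequence window around residue i and the Q3 score, can be illustrated with a minimal Python sketch. It is written in Python for illustration only and is not taken from the paper; the function names `window` and `q3`, the `-` padding character and the example strings are all assumptions.

```python
def window(sequence, i, half_width=7, pad="-"):
    """Return the residues at positions i - half_width .. i + half_width.

    Positions that fall off either end of the sequence are filled with
    `pad` (an assumed convention), so the result always has length
    2 * half_width + 1.
    """
    if not 0 <= i < len(sequence):
        raise IndexError("residue index out of range")
    padded = pad * half_width + sequence + pad * half_width
    # Residue i of the original sequence sits at index i + half_width
    # in the padded string, so the window starts at index i.
    return padded[i:i + 2 * half_width + 1]


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one.

    Both arguments are equal-length strings over the three states,
    written here as H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `window("ACDEFGHIK", 4, half_width=2)` gives the five residues centred on position 4, and `q3("HHEEC", "HHECC")` scores four matches out of five.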
This paper describes another attempt to increase secondary structure prediction accuracy using long-range information. The main assumption made in this work is that some form of computer program exists, at least in theory, which can do this. Such a program might mimic the folding dynamics in some way, perhaps in one, two or three dimensions using a reduced complexity model. For example, a predictor could have a simple rule: “predict weak-strand-region as strand if number-of-already-assigned-strands ≥ 2”. This rule could, for example, be applied after assigning “strong-strand-regions”, but before assigning regions with helical sequence patterns. The rationale here is that strands might be more likely to form in the context of an already forming sheet.
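The ordered rule in this paragraph can be sketched as follows. This is only an illustration of the idea in Python, not the evolved Perl code the paper describes. The `Region` type, the `kind` labels, the `min_strands` parameter and the choice to count a promoted weak region as an assigned strand are all assumptions, and how candidate regions are detected in the first place is left outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int  # first residue index (inclusive)
    end: int    # last residue index (exclusive)
    kind: str   # "strong_strand", "weak_strand" or "helix_pattern"


def assign_states(length, regions, min_strands=2):
    """Assign H/E/C states in three ordered passes over candidate regions.

    Assumes 0 <= start <= end <= length for every region.
    """
    states = ["C"] * length  # everything starts as coil
    n_strands = 0

    # Pass 1: strong strand regions are always assigned as strand.
    for r in regions:
        if r.kind == "strong_strand":
            states[r.start:r.end] = ["E"] * (r.end - r.start)
            n_strands += 1

    # Pass 2: a weak strand region becomes strand only if enough strands
    # have already been assigned (the rule quoted in the text).
    for r in regions:
        if r.kind == "weak_strand" and n_strands >= min_strands:
            states[r.start:r.end] = ["E"] * (r.end - r.start)
            n_strands += 1  # assumption: promoted regions count too

    # Pass 3: helical sequence patterns fill residues that are still coil.
    for r in regions:
        if r.kind == "helix_pattern":
            for i in range(r.start, r.end):
                if states[i] == "C":
                    states[i] = "H"

    return "".join(states)
```

With two strong strand regions present, a weak region is promoted to strand; with only one, it stays coil. The outcome for a region therefore depends on assignments made elsewhere in the sequence, which is the kind of long-range context the paragraph has in mind.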


Similar resources

Prediction of Secondary Structure of Citrus Viroids Reported from Southern Iran

Viroids are the smallest single-stranded, circular, highly structured plant-pathogenic RNAs and do not code for any protein. Viroids belong to two families, the Avsunviroidae and the Pospiviroidae. Members of the Pospiviroidae family adopt a rod-like secondary structure. In this study the most stable secondary structures of citrus viroid variants that were reported from Fars province wer...

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences through transcription and translation. The protein sequence later takes on a 3D structure, which is the functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

An Algorithmic Framework for the Study of Behavior of siRNA Sequences

The study of biological sequences is gaining momentum. An increasing number of researchers have proposed frameworks for implementing various algorithms for biomolecular sequence alignment and secondary structure prediction. A comparative study can also enhance the results, but alignment and prediction algorithms vary widely in terms of both sensitivity and selectivity across ...

Prediction of Protein Secondary Structure Using Genetic Programming

Certificate: This is to certify that Varun Aggarwal (104/ECE/2000), a student of NSIT, Delhi, India, did his summer training under me at Stockholm Bioinformatics Center for the months of June-July 2003. He worked on two projects documented in this report. Acknowledgement: I would like to thank Dr. Bob MacCallum for giving me this opportunity to work with his group. I hugely benefited and wish to ...

Physicochemical Position-Dependent Properties in the Protein Secondary Structures

Background: Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles for protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designing enzymes and binders by applying informative features obtained from natural structures. Methods: In this study, a ...



Publication date: 2002
